Two Easy Improvements to Lexical Weighting

Authors

  • David Chiang
  • Steve DeNeefe
  • Michael Pust
Abstract

We introduce two simple improvements to the lexical weighting features of Koehn, Och, and Marcu (2003) for machine translation: one which smooths the probability of translating word f to word e by simplifying English morphology, and one which conditions it on the kind of training data that f and e co-occurred in. These new variations lead to improvements of up to +0.8 BLEU, with an average improvement of +0.6 BLEU across two language pairs, two genres, and two translation systems.
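
As background, the lexical weight of Koehn, Och, and Marcu (2003) scores a phrase pair as follows. For each English word, it averages the word translation probabilities w(e|f) over the foreign words that English word is aligned to (an unaligned English word is treated as aligned to NULL). It then multiplies these averages together. Here w(e|f) is a relative frequency estimated from word-aligned training data. The Python sketch below shows that baseline and one plausible reading of each improvement, under stated assumptions. It is not the authors' implementation. The suffix-stripping `simplify` function is a hypothetical stand-in for their morphological simplification. `provenance_tables` is also hypothetical: it trains one table per training-data label (for example, genre), on the assumption that each table would then supply its own feature.

```python
from collections import defaultdict

NULL = "<null>"  # pseudo-word that unaligned English words are linked to


def simplify(word):
    """Hypothetical stand-in for English morphological simplification:
    lowercase, then strip one common suffix if a stem of 3+ letters remains.
    This is NOT the procedure used in the paper."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def identity(word):
    return word


def train_word_table(corpus, normalize=identity):
    """Relative-frequency estimate of w(e|f) from word-aligned sentence pairs.

    corpus: iterable of (f_words, e_words, alignment), where alignment is a set
    of (i, j) pairs linking e_words[i] to f_words[j].
    """
    joint = defaultdict(float)
    total = defaultdict(float)
    for f_words, e_words, alignment in corpus:
        links = [(f_words[j], normalize(e_words[i])) for i, j in alignment]
        aligned = {i for i, _ in alignment}
        # Unaligned English words are counted as translations of NULL.
        links += [(NULL, normalize(e)) for i, e in enumerate(e_words) if i not in aligned]
        for f, e in links:
            joint[f, e] += 1
            total[f] += 1
    return {(f, e): c / total[f] for (f, e), c in joint.items()}


def lexical_weight(f_words, e_words, alignment, table, normalize=identity):
    """Koehn-style p_w(e|f,a): for each English word, average w(e|f) over its
    aligned foreign words (NULL if none), then multiply the averages."""
    score = 1.0
    for i, e in enumerate(e_words):
        sources = [f_words[j] for k, j in alignment if k == i] or [NULL]
        e = normalize(e)
        score *= sum(table.get((f, e), 0.0) for f in sources) / len(sources)
    return score


def provenance_tables(labeled_corpus, normalize=identity):
    """Hypothetical: one w(e|f) table per training-data label (e.g. genre)."""
    by_label = defaultdict(list)
    for label, pair in labeled_corpus:
        by_label[label].append(pair)
    return {label: train_word_table(pairs, normalize)
            for label, pairs in by_label.items()}


# Toy usage with made-up French-English data.
corpus = [
    (["la", "maison"], ["the", "house"], {(0, 0), (1, 1)}),
    (["les", "maisons"], ["the", "houses"], {(0, 0), (1, 1)}),
    (["maison"], ["home"], {(0, 0)}),
]
plain = train_word_table(corpus)
smooth = train_word_table(corpus, normalize=simplify)
print(lexical_weight(["maison"], ["houses"], {(0, 0)}, plain))  # 0.0
print(lexical_weight(["maison"], ["houses"], {(0, 0)}, smooth, simplify))  # 0.5
tables = provenance_tables([("news", corpus[0]), ("news", corpus[1]), ("web", corpus[2])])
```

In this sketch, neither variant changes the baseline formula. Both change only the counts behind w(e|f). Simplification pools inflected English forms, so the unseen pair (maison, houses) borrows probability from (maison, house). Conditioning splits the counts by the kind of data in which each pair was observed.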

Similar resources

The two be's of English

This qualitative study investigates the uses of be in Contemporary English. Based on this study, one easy claim and one more difficult claim are proposed. The easy claim is that the traditional distinction between be as a lexical verb and be as an auxiliary is faulty. In particular, 'copular-be', traditionally considered to be a lexical verb, is in fact a prototypi...

Semantic Feature Analysis Treatment for Anomia of Two Nonfluent Persian-Speaking Aphasic Patients

Objectives: Semantic Feature Analysis was designed to improve lexical retrieval of aphasic patients via activation of semantic networks of the words. In this approach, anomic patients are treated with semantic information to assist oral naming. The purpose of this study was to examine the effects of Semantic Feature Analysis treatment on anomia of two nonfluent aphasic patients. Methods: A...

Combining lexical and statistical translation evidence for cross-language information retrieval

This paper explores how best to use lexical and statistical translation evidence together for Cross-Language Information Retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the ...

Workshop Notes of the ECML / MLnet Workshop on Empirical Learning of Natural Language Processing Tasks

This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and...

Topic Models for Dynamic Translation Model Adaptation

We propose an approach that biases machine translation systems toward relevant translations based on topic-specific contexts, where topics are induced in an unsupervised way using topic models; this can be thought of as inducing subcorpora for adaptation without any human annotation. We use these topic distributions to compute topic-dependent lexical weighting probabilities and directly incorpo...

Publication date: 2011